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This article presents maximum likelihood estimators (MLEs) and 
log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of 
Gaussian random symmetric matrices of arbitrary dimension, where 
the observations are independent repeated samples from one or two 
populations. These inference problems are relevant in the analysis 
of diffusion tensor imaging data and polarized cosmic background 
radiation data, where the observations are, respectively, 3x3 and 
2x2 symmetric positive definite matrices. The parameter sets in- 
volved in the inference problems for eigenvalues and eigenvectors are 
subsets of Euclidean space that are either afhne subspaces, embedded 
submanifolds that are invariant under orthogonal transformations or 
polyhedral convex cones. We show that for a class of sets that includes 
the ones considered in this paper, the MLEs of the mean parameter 
do not depend on the covariance parameters if and only if the covari- 
ance structure is orthogonally invariant. Closed- form expressions for 
the MLEs and the associated LLRs are derived for this covariance 
structure. 

1. Introduction. Consider the signal-plus-noise model 
(1) Y = M + Z, 

where Y,M,Z G Sp, the set oi p x p symmetric matrices {p> 2). Here M 
(capital fi) is a mean parameter and Z is a mean-zero Gaussian random 
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matrix. Our goal is to estimate and test hypotheses about M when it is 
restricted to subsets of Sp defined in terms of the eigenvalues and eigen- 
vectors of M. In particular, we are interested in testing whether M has a 
fixed set of eigenvalues or a fixed set of eigenvectors, taking into account 
ordering, and whether the eigenvalues of M have particular multiplicities. 
We consider both the one- and two-sample problems, where repeated inde- 
pendent observations of Y are sampled from one population with mean M 
or two populations with means Mi and M2, respectively. In the two-sample 
problem, the pairs (Mi,M2) are restricted to subsets of Sp x Sp defined in 
terms of the eigenvalues and eigenvectors of Mi and M2. Here we are inter- 
ested in testing whether Mi and M2 have the same eigenvalues or the same 
eigenvectors, taking into account the possibility of the eigenvalues having 
particular multiplicities. 

This problem contains many of the ingredients of a classical multivariate 
problem. What makes it interesting and relevant are the following two as- 
pects of it. First, new massive data sources have appeared recently where 
the observations take the form of random symmetric matrices, for which 
model (1) is appropriate. Second, the mean parameter sets involved in the 
inference problems for eigenvalues and eigenvectors have interesting geome- 
tries as subsets of Sp and Sp x Sp-. affine Euclidean subspaces, orthogonally 
invariant embedded submanifolds and convex polyhedral cones. Deriving 
maximum likelihood estimators (MLEs) in these sets is nontrivial as the 
multiplicities of the eigenvalues of M affect both the dimension of the set 
and whether the true parameter falls on the boundary of the set. The effect 
of the covariance structure also depends on the geometry of the set. We show 
in Theorems 3.1 and 3.2 that for a particular class of sets that includes all 
the sets considered in this paper, the MLEs do not depend on the covariance 
parameters if and only if Z in (1) has an orthogonally invariant covariance 
structure. Closed-form expressions for the MLEs and LLRs are then derived 
for this particular covariance structure in Propositions 4.1 and 4.2, and in 
Theorems 4.1, 4.2, 5.1 and their corollaries. 

Examples of random symmetric matrix data are found in diffusion tensor 
imaging (DTI) and polarized cosmic background radiation (CMB) measure- 
ments. DTI is a modality of magnetic resonance imaging that produces at 
every voxel (3D pixel) a 3 x 3 symmetric positive definite matrix, also called 
a diffusion tensor (Basser and Pierpaoli [4], LeBihan et al. [20]). The dif- 
fusion tensor describes the local pattern of water diffusion in tissue. In the 
brain, it serves as a proxy for local anatomical structure. The tensor eigen- 
values are generally indicative of the type of tissue and its health, while 
the eigenvectors indicate the spatial orientation of the underlying neural 
fibers. Since the anatomical information is contained in the eigenstructure 
of the diffusion tensor, inference methods for the eigenvalues and eigenvec- 
tors of the tensor are useful in anatomical imaging studies. The one- and 
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two-sample testing problems arise when comparing images voxelwise to a 
fixed anatomical atlas or between two groups of subjects. 

In astronomy, the polarization pattern of the CMB can be represented by 
2x2 symmetric positive definite matrices; here again the information is con- 
tained in the eigenstructure. The eigenvalues relate to the electromagnetic 
field strength while the eigenvectors indicate the polarization orientation 
(Hu and White [15], Kogut et al. [17]). The one-sample estimation problem 
arises in image interpolation, while two-sample testing problems arise when 
comparing regions of the sky. 

DTI data has been shown to be well modeled by (1) (Basser and Paje- 
vic [3]). In this case the pattern of diffusion in the tissue is captured by 
M, while Z captures the variability in the measurements. The Gaussian- 
ity assumption for Z holds for high signal-to-noise ratio (SNR) and the 
orthogonal invariance property holds for orthogonally invariant field gradi- 
ent designs. In the DTI literature, model (1) has also been used to model 
the data after a matrix log transformation (Arsigny et al. [1], Fletcher and 
Joshi [13], Schwartzman [30]). The matrix log, computed by taking the log 
of the eigenvalues and keeping the eigenvectors intact, maps the observed 
positive definite matrices to the set 5p. This ensures that, when transformed 
back, the estimated matrices are always positive definite, as required by the 
anatomy. Whether the log transform should be applied is a subject of cur- 
rent debate (Whitcher et al. [33]). The methods developed in this paper are 
applicable in either case because the matrix log affects only the eigenvalues 
in a one-to-one fashion, so the various hypotheses about eigenvalues and 
eigenvectors can be equivalently stated in both domains. 

The inference problems for the eigenvalues and eigenvectors of M in (1) 
are nonstandard in that many of them involve nonlinear parameter sets. 
Suppose M is constrained to lie in a generic subset C or M.2 C Sp x 
Sp. The existence of MLEs and their consistency are guaranteed for closed 
parameter sets (Wald [32]), but their asymptotic distributions depend on 
the geometry of these sets. The cases considered here involve three kinds of 
sets: 

• Affine subspaces of Sp and SpxSp, that is, translations of a linear subspace 
by a constant Mq S Sp or Mq £ SpX Sp. In these cases the MLE is unique 
and exactly multivariate normal. 

• Embedded submanifolds of Sp and Sp x Sp defined by a set of constraints, 
much in the same way that the set Op oi p x p orthogonal matrices is 
defined by the set of constraints Q'Q = QQ' = Ip, where Ip is the p x 
p identity matrix. If the submanifold is closed or if the true parameter 
is in the interior of the set, then the MLE is unique almost surely and 
asymptotically multivariate normal on the tangent space to the manifold 
at the true parameter (Efron [11]). 
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• Closed convex cones in Sp, that is, closed convex sets C where there exists 
a vertex Mq such that X ^ C implies a{X — Mq) + Mq G C for any real 
nonnegative a. In this case the MLE is also unique almost surely. The 
asymptotic distribution of the MLE is normal if the true parameter M is 
inside the cone and is the projection of a normal onto the support cone at 
the true parameter if the true parameter is on the boundary of the cone 
(Self and Liang [31]). 

In testing problems, the distribution of the LLRs associated with the 
MLEs depends on the geometries of the null set A^o and the alternative set 
Ma- If the sets are nested and both are closed embedded submanifolds, then 
as n ^ oo, the LLR is asymptotically with number of degrees of freedom 
equal to the difference in dimension between the null and the alternative 
(e.g., Mardia, Kent and Bibby [23], page 124, Lehman [21], page 486). If 
the null set is a cone that is not a submanifold and the alternative is unre- 
stricted, then the LLR is a mixture of with various numbers of degrees 
of freedom according to the dimensions of the faces of the cone and with 
weights that depend on the angles between the faces (Self and Liang [31]). 
The multiplicities of the eigenvalues of M play a crucial role as they deter- 
mine the dimension of the hypotheses. For example, if a LLR test statistic 
is derived assuming specific multiplicities but the true parameter has differ- 
ent multiplicities, then the LLR no longer has the prescribed asymptotic 
distribution (Drton [8]). For this reason, we assume throughout this paper 
that the multiplicities of the eigenvalues at any particular hypothesis are 
fixed and known. This is a reasonable assumption in DTI data, as different 
tissue types typically correspond to diffusion patterns with characteristic 
eigenvalue multiplicities, often called isotropic, prolate, oblate, and fully 
anisotropic diffusion (Zhu et al. [34]). 

The normal distribution for symmetric random matrices has been known 
in statistics for over half a century. The orthogonally invariant covariance 
structure dates at least as far back as the work of Mallows [22] , who presented 
it in its most general form. James [16] obtained a spherically symmetric ver- 
sion of this distribution as a particular limit of the Wishart distribution and 
applied it in a study of the ordered characteristic roots of random matrices. 
Another form of the normal distribution for symmetric matrices has been 
studied as a special case of the normal distribution for rectangular random 
matrices (Gupta and Nagar [14]). The covariance in this case is defined 
in terms of the rows and columns of the matrix and is not necessarily or- 
thogonally invariant. A connection between this distribution and James' is 
made by Chikuse [7]. In modern random matrix theory, the most common 
Gaussian distribution for symmetric matrices is the Gaussian Orthogonal 
Ensemble (GOE), which was developed independently in the physics liter- 
ature (Mehta [24]). The covariance structure of the GOE is orthogonally 
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invariant and is a special case of the general orthogonally invariant covari- 
ance structure originally introduced by Mallows [22]. 

Despite the historical presence of the Gaussian distribution for symmetric 
matrices, little can be found in the statistics literature about inference for 
eigenvalues and eigenvectors when samples are drawn from that distribution, 
at least for fixed p. The only reference we have found is the work by Mallows 
[22], who studied linear hypotheses involved in testing the eigenvectors of a 
single matrix. Our expressions for the MLEs and the corresponding LLRs 
are derived assuming the same orthogonally invariant covariance structure 
described there. The results are stated in Propositions 4.1 and 4.2, and in 
Theorems 4.1, 4.2 and 5.1, and their corollaries. We show in Theorems 3.1 
and 3.2 that the orthogonally invariant covariance structure is precisely the 
type of covariance needed to make the problem of finding MLEs independent 
of the covariance parameters for a class of parameter sets that include all 
the sets considered in the present article. Proposition 3.1 provides a LLR 
test for checking whether the orthogonally invariant covariance structure is 
appropriate for any particular data set. 

The theory of curved exponential families is useful for finding the asymp- 
totic distribution of the MLEs on embedded submanifolds. Interestingly, 
there is a close resemblance between the problem of estimating the mean of 
a normal symmetric matrix when its eigenvalues are fixed and Fisher's clas- 
sical circle problem of a bivariate normal with mean parameter constrained 
to a circle of fixed radius (Efron [11]). Other useful tools are optimization 
techniques on Op (Chang [5], Edelman, Arias and Smith [10]) and algorithms 
for estimation over parameter sets with edges (e.g., Chernoff [6], Self and 
Liang [31]), and sets with linear constraints (e.g., Lawson and Hanson [19], 
Dykstra [9]). 

The inference problems considered in this paper are motivated by the 
analysis of DTI data. There are many combinations of inference problems for 
eigenvalues and eigenvectors depending on the restrictions that are imposed 
on the parameters. Section 2 provides an overview of the different cases. To 
consolidate the results, we first derive general results regarding the MLEs 
and LLRs for the orthogonally invariant symmetric-matrix-variate normal 
distribution in Section 3. We then apply these results to the various specific 
problems for eigenvalues and eigenvectors in Sections 4 and 5. 

2. Inference for eigenvalues and eigenvectors: overview. 

2.1. One-sample problems. The one-sample problem in DTI data is use- 
ful when comparing an individual or a small group of individuals to a fixed 
anatomical atlas. In this comparison, the investigator may be interested in 
making inferences about the full tensor M, only about its eigenvalues, or 
only about its eigenvectors. The cases considered here are summarized in 
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Fig. 1. Tests and parameter sets in the one-sample problem. Each test is represented by 
an arrow, where the origin of the arrow indicates the null hypothesis and the end of the 
arrow indicates the alternative hypothesis. The dashed borders indicate sets for which the 
MLE has no closed-form. The orthogonal and parallel notation is explained in Definition 
3.1, Section 3.2. 

Figure 1. Of the seven tests, three involve affine subspaces, one involves a 
convex cone, and three involve orthogonally invariant submanifolds. All the 
sets involved satisfy the conditions of Theorem 3.1 below, so that the MLE 
of M does not depend on the covariance parameters when the covariance 
structure is orthogonally invariant. We begin with the tests involving affine 
subspaces. 

(AO) Unrestricted test. Here no particular attention is paid to the eigen- 
structure. The test is Hq:M = Mq vs. Ha-M ^Mq. 

(Al) Test of eigenvalues with known unordered eigenvectors. This test is 
useful for assessing differences in eigenvalues, while the eigenvectors are 
fixed from the atlas. Denote by T>p the set oi p x p diagonal matrices. 
The test is Ho:M = Mq = UoDqUq vs. Ha: M gMuq, where 

(2) Muo={M = UoDU{,:DeVp} 

for fixed Uq £ Op. This is equivalent to testing whether the eigenvalues 
D = UqMUq of M are different in value or in order from the eigenvalues 
-Do = UqMqUo of Mq. The fixed eigenvectors Uq are assumed unordered 
in the sense that they represent only a coordinate system, thus allowing 
the eigenvalues to change their order under the alternative. Moreover, 
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no restrictions are placed as to whether there are multiphcities in the 
eigenvalues of D. These assumptions make Muo ^ linear subspace of 
5p, a hyperplane of dimension p that contains the multiples of Ip. 
(A2) Test of unordered eigenvectors with unknown eigenvalues. This is an- 
other test that falls in the linear category. The test is : M £ Mjjg vs. 
M ^ Mjjg, where Mjjo is given by (2). This is a test of whether M has 
particular eigenvectors Uq in any order (i.e., the columns of Uq diag- 
onalize M) while the eigenvalues are treated as nuisance parameters. 
This is one of the cases that was considered by Mallows [22] . 

Next we consider one test involving a polyhedral convex cone. 

(C2) Test of ordered eigenvectors with unknown eigenvalues. This test serves 
the same purpose as test (A2), but here the eigenvectors are forced 
to be ordered, so that the first eigenvector corresponds to the largest 
eigenvalue, the second eigenvector to the second largest eigenvalue, and 
so on (ties allowed). The change in assumptions makes the problem 
nonlinear. Here the test is Hq : M G -M^^^ vs. M ^ -^^g > where 

(3) M^^ = {M = U^DU'^-.D^ Vp,di>---> dp}, 

Uq G Op is fixed and di> ■ ■ ■ > dp are the diagonal entries of D. The 
parameter set -A^^^ is a closed polyhedral convex cone determined by 
the constraints di> ■ ■ ■ >dp and rotated by the matrix Uq. 

The last three tests involve orthogonally invariant embedded submanifolds 
of Sp. 

(SI) Test of ordered eigenvectors with known eigenvalues. This test serves 
the same purpose as test (A2), but here the eigenvalues are treated as 
fixed from the atlas. The change in assumptions makes the problem 
nonlinear. The test is Hq : M = Mq = UqDqUq vs. M £ Mdq, where 

(4) MDo = {M = UDoU':U£Op} 

for fixed Dq £ Dp with diagonal entries di > ■ ■ ■ > dp. If the diagonal 
entries of Dq are distinct, this is equivalent to testing whether M has 
eigenvector matrix Uq G Op (up to sign flips of the columns oiUo), while 
the eigenvalues di> ■ ■ ■ > dp oi M are known and fixed. In general, the 
set M Do is the set of matrices with fixed k <p distinct eigenvalues di > 
■ ■ ■ > dk and corresponding multiplicities nii, . . . , so that J2i=i nii = 
p. This is a closed embedded submanifold of Sp that is invariant under 
transformations QMQ' for Q £ Op. Note that its dimension depends 
on the multiplicities mi, . . . ,mfc. For example, if the eigenvalues have 
no multiplicities {k = p), then M.Do is diffeomorphic to Op and has 
dimension p{p — l)/2, but if all the eigenvalues are equal {k = 1, mi = 
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p), then M^Dq reduces to the single point {dilp}. In the case p = 2 
with di^ d2, the set is exactly the same as the parameter set 

in Fisher's circle problem of a bivariate normal with mean parameter 
constrained to a circle of fixed radius (Efron [11]). 

(52) Test of eigenvalues with unknown eigenvectors. This test serves the 
same purpose as test (Al), except that the eigenvectors are treated 
as nuisance parameters. Again the change in assumptions makes the 
problem nonlinear. Here the test is Hq :M £ vs. Ha ■ M ^ Mijg, 
where Mdo is given by (4). This is equivalent to testing whether M 
has a set of eigenvalues equal to those in Dq , where the set may include 
repeats depending on the multiplicities. Because the eigenvectors are 
not specified, the null includes reorderings of the eigenvalues. 

(53) Test of eigenvalue multiplicities. This is a test of whether M has 
eigenvalues with specific multiplicities against the eigenvalues being 
all distinct. This is useful in DTI to identify tissue types since, in 
theory, tensors in the cerebral fluid and the gray matter are isotropic 
(di = d2 = d^), tensors in single fibers are prolate {di > d2 = d^) and 
tensors in fiber crossings may be oblate {di = d2 > d-s). The test is 
M e Mrni,...,mk VS. M ^ Mrni,...,mk , where 

(5) Mrm,...,mk = {M = UDU' -.U eOp,D £Vp, mult, mi, . . . , mfc} 

is the set of matrices with unspecified k <p distinct eigenvalues di > 
■ ■ ■ > dk and corresponding multiplicities mi, . . . , so that X]i=i "^i = 
p. This is an orthogonally invariant embedded submanifold of Sp whose 
dimension depends on the multiplicities. For example, if di = ■ ■ ■ = dp 
[k = 1, mi = p) we get the closed straight line Tp = {alp, a G M} of 
dimension 1, but \i di ^ ■ ■ ■ ^ dp {k = p) then we get the open set 
Sp \ Ip of dimension p. 

The specific cases above are worked out in detail in Section 4. Before 
proceeding, it is helpful to visualize the above parameter sets in the case 
p = 2. 

Example 2.1 {p = 2). For X e ^2, define the embedding of ^2 in 
by the operator {x,y,z) = vecd(X) = (Xii,X22,\/2-^i2). In Figure 2, the 
set of diagonal matrices vecd(L>) = {di,d2,0) corresponds to the xy-plane, 
while the set of matrices with equal eigenvalues di = d2 are multiples of the 
identity matrix and map to the line X2 = {x = y, z = 0}. For a generic M, 
consider the eigendecomposition 

(6) M = UDU'=(''"^, "'""i)- 

\ sm 6' cos J \ d2 J \ — sm8 cos J 
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Then M gets mapped to {x,y,z) =vecd(M), where 

x={di+ d2)/2 + [{di - d2)/2] cos 29, 

y = {di+ d2)/2 - [{di - d2)/2] cos 29, 

z = V2[{di-d2)/2\ sin 2(9. 

The effect of ?7 is a rotation around the axis Z2 by an angle 26. In Figure 
2, the circle represents the rotation trajectory of the point (3,1,0) on the 
plane x + y = A with center at (2, 2). 

The set M.Uq defined by (2) has fixed 9 and unrestricted di and ^2, so 
it maps to a plane through the line I2 making an angle 29 with the xy 
plane. Notice that replacing by 9 + tt results in the same plane, as the 
eigendecomposition is invariant under sign changes of the eigenvectors. If the 
eigenvalues are constrained so that di > ^2, then we get the set A^^^ defined 
by (3). This is half the previous plane, a closed convex cone that is not an 
embedded submanifold of ^2 . The set M. Dq defined by (4) has fixed di and ^2 
and allows 9 to vary. When di 7^ d2 it maps to a circle of radius ^/2{dl —d2)/2 
orthogonal to the line X2 and centered at ((di +(i2)/2, (di + (i2)/2,0), shown 
in Figure 2 for di = 3 and c?2 = 1 . This is a closed embedded submanifold of 
S2 that is invariant under orthogonal transformations QMQ' for Q £ 0{2). 
When projected on the plane of the circle, this parameter set is exactly the 
same as the parameter set in Fisher's circle problem of a bivariate normal 
with mean parameter constrained to a circle of fixed radius (Efron [11]). 
The dimension of this set depends on the multiplicities of the eigenvalues. 
If di = d2, then the circle collapses to a single point. 

2.2. Two-sample problems. The two-sample problem in DTI data is use- 
ful when comparing two groups, such as in case-control studies. As in the 




Fig. 2. Examples of orthogonally invariant subsets of S2 embedded in K^. 
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Fig. 3. Test and parameter sets in the two-sample problem. Each test is represented by 
an arrow, where the origin of the arrow indicates the null hypothesis and the end of the 
arrow indicates the alternative hypothesis. The dashed borders indicate sets for which the 
MLE has no closed-form. The orthogonal and parallel notation is explained in Definition 
3.2, Section 3.3. 



one-sample case, there are many combinations of hypotheses. Here we con- 
sider nontrivial situations where some parameters are common and others 
are not: the case where M\ and M2 have common eigenvalues and the case 
where Mi and M2 have common eigenvectors. The cases considered here are 
summarized in Figure 3. Of the five tests, only the first is linear. The rest 
are nonlinear. All the sets involved satisfy the conditions of Theorem 3.2 
below, so that the MLEs of Mi and M2 do not depend on the covariance 
parameters when the covariance structure is orthogonally invariant. 

(AO) Unrestricted test. Here no particular attention is paid to the eigen- 
structure. The test is Hq:Mi= M2 vs. Ha:Mi^ M2. 

(SI) Test of equality of eigenvalues with unrestricted eigenvectors. This is 
a test of whether the means of two populations have the same eigen- 
values, treating the eigenvectors as nuisance parameters. In DTI, this 
is useful for determining, for instance, if the two populations corre- 
spond to the same tissue type. The test is Hq : (Mi,M2) G M.2,d vs. 
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Ha ■■ (Ml, M2) i M2,D, where 



(7) 



M2,D = {(Ml, M2) : Ml = UiDU[,M2 = U2DU^} 



for unspecified D GVp and Ui,U2 ^ Op. D is assumed to have un- 
specified k distinct eigenvalues di > ■ ■ ■ > dk with fixed and known 
multiphcities mi, . . . , mfc. This is an embedded submanifold that is in- 
variant under transformations QiMiQ'i and Q2M2Q2 for Qi,Q2 S Op- 
Its dimension depends on the multiphcities. 

(52) Test of equality of eigenvectors with common eigenvalues. This is a test 
of whether the means of two populations have the same eigenvectors, 
when the eigenvalues are treated as nuisance parameters and assumed 
equal between the two populations. In DTI, this is useful for determin- 
ing, for instance, if two populations of the same tissue type have neural 
fibers oriented in the same direction in space. The test is Hq : Mi = M2 
vs. Ha '■ (Ml, M2) E M.2,D, where M.2,D is given by (7). Here again the 
dimension depends on the multiplicities of the eigenvalues in D, which 
are assumed to be known. 

(53) Test of equality of eigenvalues with common eigenvectors. This test 
serves the same purpose as test (SI) of testing equality of eigenvalues 
while treating the eigenvectors as nuisance parameters, with the added 
restriction that the eigenvectors are common to both populations. The 
test is Ho : Mi = M2 vs. Ha ■ (Mi, M2) G M2,u, where 



for unspecified Di,D2 G T>p and U G Op. Again, this is an embed- 
ded submanifold that is invariant under transformations QMiQ' and 
QM2Q' for Q G Op. Its dimension depends on the multiplicities of the 
eigenvalues in Di and D2. 
(S4) Test of equality of eigenvectors with unrestricted eigenvalues. This test 
serves the same purpose as test (S2) of testing equality of eigenvec- 
tors, when the eigenvalues are treated as nuisance parameters, but the 
assumption of common eigenvalues has been removed. Here the test is 
Hq : (Mi,M2) G M2,u vs. Ha : (Mi,M2) ^ M2,u, where M2,u is given 



The set Ai2,u involved in cases (S3) and (S4) is a difficult case for which 
the MLE does not have a closed-form. For this reason, below we focus on the 
solutions to cases (SI) and (S2). These are worked out in detail in Section 



(8) 



M2,u = {(Mi,M2) :Mi = UDiU',M2 = UD2U'} 



by (8). 



5. 



12 A. SCHWARTZMAN, W. F. MASCARENHAS AND J. E. TAYLOR 



3. General inference for Gaussian symmetric matrices. 

3.1. Covariance structures. Consider the embedding of Sp m. , q = 
p{p + l)/2, p>2, by the operator vecd(y) = (diag(y)', V2ofFdiag(y)')', 
where diag(y) is a p x 1 column vector containing the diagonal entries of 
Y £ Sp and offdiag(y) is a (q — p) x 1 column vector containing the off- 
diagonal entries of 1". A random matrix y G 5p is nondegenerate Gaussian 
if and only if it has density 

with respect to Lebesgue measure on M"?, where Ti £ Sq is positive definite 
and II • III; is the norm corresponding to the inner product in Sp 

(10) {A, B)s = (vecd(yl))'S-^ vecd(S). 

The density (9), which we denote Npp(M,Y^), has mean parameter M = 
E{Y) G Sp and covariance parameter S = cov(vecd(y)). Notice that ||y — 
M|||~x'(g). 

The covariance structure of Z = Y — M in (9) is called orthogonally 
invariant if the distribution of Z is the same as that of QZQ' for any 
Q G Op. Let Ip = (1, . . . , 1)' be the p-vector of ones. Mallows [22] showed 
that Z has orthogonally invariant covariance if and only if cov(diag(Z)) = 
cr^(/p + clpl'p) for some and c, and cov(offdiag(Z)) = ((T^/2)/g_p, inde- 
pendent of diag(Z). For p = 3, the orthogonally invariant covariance T, is 
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Positive definiteness of S requires that > and c > —l/p. In particular 
when c>0,Z may be constructed as Z = a{^/cIpW + W), where w ~ A^(0, 1) 
and W £Sp has the GOE distribution, that is, W has independent diagonal 
entries A^(0,1) and independent off-diagonal entries A^(0, 1/2). 

For convenience, we define r = c/(l + pc) < l/p. The distribution of y = 
M + Z, which we call the orthogonally invariant symmetric-matrix- variate 
normal distribution, and denote Npp{M , a'^ , t) , has density 

where the inner product (10) defining the norm || • ||^2 ^ has the explicit form 
(12) {A,B)^2^, = [tT{AB)-TtT{A)tT{B)]/a\ 
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Notice that ||^||io = tr(A^) is the standard Frobenius norm of symmetric 
matrices. In the density (11), the mean parameter is M S Sp, while the co- 
variance is captured by the two scalar parameters o"^ > and r e (—00, 1/p). 
In this notation, the GOE distribution is the same as Npp{0, 1,0) and may 
be thought of as a standard normal for symmetric matrices. In general when 
r = 0, we call the covariance structure spherical. Later we use the following 
two obvious properties derived from the orthogonal invariance: 

(1) If y~iVpp(M,cj2,r), then QYQ' Npp{QMQ' ,t) for all Q G Op. 

(2) {A, B),2^, = {QAQ', QSg')a2,r for all Q £ Op. 

Recall from Example 2.1 that when p = 2, the effect of the operation QZQ' 
for Q € 0(2) is a rotation of vecd(Z) around the axis T2. The orthogonal 
invariance of the distribution of Z implies that the contours of the density 
N22{M,a'^,T) are ellipsoids that are circularly symmetric around an axis 
that is parallel to l2- Similarly for general p, the contours of (11) are hyper- 
ellipsoids that are spherically symmetric around an axis that is parallel to 
the line Ip = {alp, a G M}. The cross sections to that axis are hyperspheres 
of dimension q — 1. The parameter r only affects the size of the unequal axis, 
where the ratio between the length of the unequal axis and the diameter of 
the hyperspheres is y/1 + pc = 1/ \/l — pr. An important consequence is that 
r only affects the variability parallel to the 2p axis, whereas the variability 
in the space orthogonal to Ip is the same as if the covariance structure were 
spherical. The orthogonally invariant covariance is the only one with this 
property. This is stated formally in Lemma 3.1 below. Notice that, by (12), 
a vector A £ Sp is orthogonal to Ip, that is, {A,Ip)^2^^ = 0, if and only if 
tr{A) = 0. 

Lemma 3.1. T, is orthogonally invariant if and only if there exists 
such that, for all matrices A,B £ Sp with ti{A) = 0, {A, B)y, = {A, B)^2 q. 

Proof, (i) Suppose S is orthogonally invariant, so that (•, = (•, ■)d'^,T 
for some a^,r. Let A be any matrix such that tr(yl) = 0. For any B G Sp, 

{A, B)-2^, = MAB) - T tT{A) tv{B)]/a^ = iT{AB)/d^ = {A, B)^2^o 

and we can take a'^ = a'^. 

(ii) Conversely, suppose that there exists cr^ such that {A, S)s = {A, B)^2 q 
for any pair of matrices A,B such that tr(^) = 0. Define the bilinear form 
h{X,W) = {X,W)t. - {X,W)„2^Q. By (12), it is sufficient to show that 
h{X, W) = rtr(X) tr(VK)/(T^ for some r. By assumption, h{X, W) = when- 
ever tr(X) = or tr(VF) = 0. Thus, by bilinearity, 

h{X, W) = h[X - tr{X)Ip/p + ti{X)Ip/p, W - tr{W)Ip/p + tiiW)Ip/p] 
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= h[X - tv{X)Ip/p, W - tr{W)Ip/p] + tv{X)h[Ip/p, W - tT{W)Ip/p] 
+ tv{W)h[X - tT{X)Ip/p, Ip/p] + ti{X) ti{W)h{Ip/p, Ip/p) 

= iT{X)iT{W)h{Ip,Ip)/p\ 

Therefore, S is orthogonally invariant with parameters <t^ and r = a^h{Ip, Ip) / 

P^ = <7^Ip\\l/p'^ -l/p. □ 

3.2. General inference in the one-sample problem. Let Yi,...,Yn i.i.d. 
Npp{M, S). Given T,,Y = X^iLi Yi/n is a sufficient and complete statistic for 
M. For future reference, the classical recipe for finding the MLE of M over 
generic subsets of Sp is stated in the following lemma. 

Lemma 3.2. Suppose M lies in a nonempty closed subset yVJ of Sp. 
Given E, the MLE of M over M, denoted M, minimizes the squared Ma- 
halanobis distance 

(13) g^{M) = \\Y-M\\l 

over M and is unique almost surely. M is consistent as oo. 

Proof. Ignoring the constant 27r, the log-likelihood from model (9) can 
be written as 

71/ Tt — 

(14) /(M,S) = --log |S| - -{si + ||y - Mill), 

where s| = {^/'n)Y^^=i \\Yi — Y\\1 and we have used the sum-of-squares de- 
composition 

n n 

^ ||y, - Mill = ^ ||y, - y III + n||y - m|||. 

i=l 1=1 

Thus, given S, maximizing the likelihood over M is equivalent to minimiz- 
ing (13) over M. The existence, almost sure uniqueness and consistency of 
the MLE is guaranteed under the assumption that the set M is closed (Wald 
[32]). □ 

Lemma 3.2 says that M can be found as the orthogonal projection of 
y on according to the inner product {A,B)j]. In general, M depends 
on S. However, if S is orthogonally invariant, then for a particular class of 
sets Ai given by Definition 3.1 below, the projection M does not depend on 
S and the problem is equivalent to finding the MLE when the covariance 
structure is spherical. This property is instrumental in the derivation of 
closed-form MLEs for the eigenvalue and eigenvector problems of Sections 4 
and 5. Conversely, if we require the irrelevance of S for finding the MLE of 
M in that class of sets, then S must be orthogonally invariant. This result 
is stated in Theorem 3.1 below. 
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Definition 3.1. Define the straight line Ip = {alp, a G M} and let "©" 
denote Minkowski addition. We say that a set M is: 

(a) Orthogonal to Zp, if it can be written as M = A® {o-Ip} for some 
a E M and some AdSp such that tr(^) = for all A. 

(b) Parallel to 2p, if it can be written as, ^A = A®Tp, where A is orthog- 
onal to Tp. 

The definition of orthogonality above is equivalent to the condition that 
for every tangent vector AtoM., {A,Ip)^2^^ = 0. For an orthogonal set M., 
tr(M) = ap is constant for all M S A^. Both properties defined in Definition 
3.1 are orthogonally invariant: if Ai is orthogonal (parallel) to Ip, so is the 
set QMQ' = {QMQ',M £ M}, where Q G Op. It is easy to check that each 
of the sets in Figure 1 belongs to one of the above categories: the single 
point {Mq} and the set M Dq orthogonal to 2^; the rest are parallel to 

T 

Mp. 

Theorem 3.1. T, is orthogonally invariant if and only if, for every em- 
bedded submanifold M <zSp that is either orthogonal or parallel to Ip, every 
critical point M of gT;{M) = \\Y — M|||. over is also a critical point of 
5i,o(M) = \\Y - M\\lo = tr[(y - Mf] . 

Proof, (i) Suppose S is orthogonally invariant, so (•,•)£ = {'■,') (j'^,t foi' 
some (T^,T. Let A denote the tangent space to M. at M. For every critical 
point M of g^2 T-{M) = \\Y — M\\'^2 ^, B = Y — M is orthogonal to all tangent 
vectors Ag A, that is, {A,B)^2^^ = 0. We need to show that every tangent 
vector A also satisfies the spherical orthogonality condition {A, B)^2^o = 
tr:{AB) = for every r. By (12), this is achieved if either tr(^) = or tr{B) = 
0. 

If M is orthogonal to Ip, then ti{A) = 0, and the result follows from 
Lemma 3.1. Assume instead M is parallel to Ip. Decompose every tan- 
gent vector A £ A as a sum of a multiple of the identity tr{A)Ip/p and an 
orthogonal tangent vector A — tr(A)Ip/p. Lemma 3.1 takes care of the or- 
thogonal vectors. As for the remaining direction parallel to Ip, note that 
= {Ip,B)^2^^ = tr{B){l — pt)/(t'^, which implies that tr(i?) = for any 

critical point M for the inner product {■,-)„2^^. Therefore, {Ip,B)^2^Q = 
tr(S)/f72 =0. 

(ii) Conversely, suppose now that for every Y £ Sp and critical point 
Ad £ ^A with respect to the inner product (•,•)£, M is also a critical point 
with respect to the inner product (•, •)i,o- Further, this holds for all sets that 
are orthogonal or parallel to Ip. In particular, it holds for every orthogonal 
affine subspace of Sp. Following a simple regression argument this implies 
that if we restrict our attention to the largest orthogonal subspace 

^max = {^G5p:tr(^)=0}, 
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then (•,•)£ agrees with (•,-)o-2,o ^ov some cr^. 

Consider the case when the minimization problem is projection onto the 
subspace ^max- As the problem is simply a linear regression, there is a 
unique critical point for either {■,■)'£ or (•,-)i,o and, further, the critical 
points are identical by assumption. It is not hard to see that the critical 
point is M = Y — Yip/ p. The normal equations for this problem are 

= (y - M, A)s = ^ (/, A)s = ^ (/, A)i,o 

p p 

and these must hold for all Y and all A E ^max- In other words, (/p, A)i,o = 
(Ip, A)^ = 0, for all A G ^^ax- Define W) = {X, W)s - {X, W)^2^o where 
(7^ is described above. We have to show that h{X,Y) = r tr(X) tr(VK) for 
some T. The proof proceeds similarly to the final display in the proof of 
Lemma 3.1. □ 



By Theorem 3.1, if we assume the orthogonally invariant model (11), then 
within the class of sets M that are either orthogonal or parallel to Ip, the 
problem of finding the MLE of M reduces to minimizing tr[(y — M)^] over 
Ad. Once M is found, the estimates of a and r in model (11) easily follow 
and are given by the following lemma. 

Lemma 3.3. Let yi,...,y„ i.i.d. Npp{M,a'^ ,t), p>2. Suppose M lies 
in a nonempty closed subset M of Sp and denote by M the MLE of M . 
Given r, if we define 

I - 1 " _ 

(15) al = sl + -\\Y-M\\l,, 4 = _^||y,_y||2 

q gn^ 

then (M,(T^) is the MLE of {M,a'^) in M x M+. Moreover, if we define 

(g - l)Er=i[tr(y. - y)]2 + n[tr(y - M)]2] ' 

then (M,6-?,f) is the MLE of {M,a'^ ,t) in M xR+ x {-oo,l/p). The above 
estimates are unique almost surely and are consistent as n— > oo. 

Proof. Under model (11), the log-likelihood (14) becomes 

(17) 1{M, r) = log + ^ log(l - pr) - ^ + ^11^" ^Wlr) ■ 

The MLEs for cj^ and r are obtained by differentiating (17) and setting the 
derivative equal to zero. It is easy to check that f < 1/p. Since M — > M as 
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n — > oo, 



Similarly, 



[{q-l)/n)Y.l=iMyi-M)f ' (g-l)£;[tr(y-M)]2- 

The numerator is - Mf{^^i^ = tr[(/p + ll'c)(/p - ll'q/p)]a'^ + {q - 

p)a'^ = {1 — q)pca^ , where c = t/(1— pr). The denominator is (g — l)£'[tr(y — 
M)]2 = {q- 1)1' (/p + ll'c)la^ = iq- l){p + p^c)a^. Therefore f ^ c/(l + 

pc) = T. □ 

The distribution of the variance estimate (15) for known r is established 
by noting that n\\Y — M||^2 ^ — > X^iq ~ k) as n — > oo when is a closed 
embedded submanifold of Sp of dimension k, and exactly so for all finite n 
if Ai is an affine subspace. Multiplying (15) by the appropriate constants 
gives that, given r, qna'^/a'^ is the sum of two independent terms and thus 
has an exact or approximate distribution with q{n — 1) + q — k = qn — k 
degrees of freedom, depending on whether the distribution of the second 
term is exact or approximate for large n. 

In the testing situation, let Yi, . . . , K„ G 5p i.i.d. Npp{M, S) and consider 
the test Hq-.M <^ Mq vs. Ha : M G MIaj where A^o C Ml a ^ Sp are closed 
embedded submanifolds with dimensions kA = dim(Al^), /cq = dim(A^o)) 
kA> ko- Let Mq and Ma denote MLEs of M under Hq and Ha, respectively, 
with corresponding maximized log-likelihoods Iq and Ia- 

Lemma 3.4. Suppose A4q C Ma ore closed embedded submanifolds of 
Sp. Given Ti, the LLR test statistic 2{Ia — Iq) is 

(18a) T = n\\Y - MoWl - n\\Y - MaWI 

(18b) = 2n{Y,MA - Moh + ^l|Mo||| - n\\MA\\l 

and is distributed as x^i^A — ^o) under Mq asymptotically as n ^ oo. // 
A^o C Ma both affine subspaces of Sp, then the distribution is exactly 
X^{kA — ko) for all finite n. 

If S is orthogonally invariant, then given a and r, the LLR test statistic 
is the same as (18a) or (18b) with the norm defined by (12). As usual, when 
cr^ and r are unknown, they can be replaced by the consistent estimators of 
Lemma 3.3. 



qn 



\Yi-M\\l 



-E\\Y - M\\l^ = a'^. 



(i/^)Er=ir.-M||2 



^r-^iiL/P 
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Example 3.1 (Unrestricted). The whole space M = Sp is triviahy an 
affine subspace of itself and is parallel to Ip. Here M = Y, a"^ = s\ and 

Notice that the closer Y]l=i^A^i - is to YJt=i^^^i - >^)]Vp, the 

closer is r to 0, suggesting sphericity of the distribution. Here we have Y ~ 
Npp{M,a'^ /n,T), and given r, qusl/a"^ ~ X^('?(^ ~ 1)) independent of y. 

The LLR test statistic for testing Hq-.M = Mq vs. Ha:M ^ Mq unre- 
stricted is given immediately by Lemma 3.4. Since Mq = Mq and Ma = Y , 
the LLR given a and r is T = n||y — Mo|p2 ~ X^io) under Hq, and is 
independent of r. If a and r are unknown, the LLR is an increasing func- 
tion of the statistic T = {n — 1)\\Y — Mq\\\ -/{qsi), which is approximately 
F{q,q{n — 1)) under Hq for large n. Both test statistics above reduce in the 
univariate case {p= 1, r = 0) to the squares of the known one-sample z and 
t statistics. 



In addition to the above tests, one may want to test whether the orthog- 
onally invariant covariance structure is an appropriate model for the data. 
This can be done using the LLR test given by the following proposition. 

Proposition 3.1. Let Yi, . . . ,Yn £ Sp i.i.d. Npp{M,T,), n> q{q + 3)/2, 
and consider the test Hq : T, is orthogonally invariant with r > vs. Ha '■ S 
is unrestricted. The LLR statistic 2{Ia — Iq) for this test is 

(20) T = nq log (7^ _ n log(l - pf) - n log | S | _ ^ 

where and f are given by (15) and (16), respectively, and T, is the em- 
pirical covariance matrix S = (1/?^) X^iLi "^ecd(li — y)[vecd(li — Y)]' . 

Proof. Replacing S in (14) gives the maximized likelihood under Ha, 
Ia = —{n/2) log |S| — nq/2 with dimension q + q{q + l)/2. Replacing a'^ and 
f in (17) gives the maximized likelihood under i?o, = — (ng/2) loga^ -|- 
(n/2) log(l — pf) — nq/2 with dimension q + 2. The LLR 2{Ia — h) is equal 
to (20) and the difference of dimensions is q{q + l)/2 — 2. □ 

We emphasize that the asymptotic distribution in (20) is guaranteed only 
if r < (i.e., c> 0). A more general situation of this kind is treated by Drton 
[8], Section 6. In that context, the set of orthogonally invariant covariances 
S with T < can be seen as a special case of a factor analysis model. The 
spherical case r = is at the boundary of that semi-algebraic set and the 
asymptotic distribution of the LLR there is not x^. 
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3.3. General inference in the two-sample problem. Let Yi, . . . ,Yni and 
Yni+i, . . . , 1^, n = ni + 712, be two independent i.i.d. samples from Npp{Mi, S) 
and iVpp(M2,S), respectively. For simplicity, we assume the covariance S is 
common to both samples, although this assumption can be relaxed in similar 
ways to the proposed solutions to the multivariate Behrens-Fisher problem 
(Scheffe [29], Mardia, Kent and Bibby [23]). Here we are interested in esti- 
mating and testing the pair M = (Mi,M2) when it is restricted to a subset 
A42 of Sp X Sp defined in terms of eigenvalues and eigenvectors of Mi and 

M2. 

To solve two-sample problems it is convenient to define the following inner 
product in Sp x Sp. For A = (j4i, A2) and B = (i?i,i?2), let 

(21a) {{A, = ni{AuBi)s + 712(^2, 

(21b) =n(avg(A),avg(i?))s + ^(A(A),A(i?))s, 

n 

where we have defined for a generic X = {Xi,X2) the linear functions 

, , niXi + 712X2 

avg(X) = , A{X)=Xi-X2. 

n 

Define the gjoup averages Yi = {l/ni)J2i=i ^^'^ ^2 = (1/'T'2) SILni+i 
and let Y = (Yi,l2)- Writing the joint likelihood of the sample and pooling 
terms leads to the MLE of M = (Mi, M2) for general sets M2 cSp x Sp as 
given by the following lemma. 

Lemma 3.5. Suppose the pair M = (Mi,M2) lies in a nonempty closed 
subset M2 of Sp X Sp. Then, given S, the MLE M = (Mi,M2) of M is 
unique almost surely and minimizes over M2 the squared distance 

(22a) 5s(Mi,M2)=ni||yi-Mi|||+n2||y2-M2||| 

(22b) = 7711 avg(y) - avg(M)||| + ^||A(y) - A(M)|||. 

n 

The MLE M is consistent as 711,772 —> 00. 

As in the one-sample case, we show that for a class of sets M2 C 5p x Sp, 
the MLE does not depend on S if and only if S is orthogonally invariant. 

Definition 3.2. Let Zp x 2p = {(Mi, M2) = (a/p,/3/p),a,/3 G M} and 
{Ip,Ip) = {(Mi,M2) = j{Ip,Ip),-f£ R}cIpX Ip. We say that a set M2 C 
Sp X Sp is! 

(a) orthogonal to Tp x Ip, if M.2 can be written as A^2 = A(B {{alp, bLp)} 
for some a, 6 G M and A C Sp x Sp such that tr(Ai) = tr(A2) = for all 
(^1,^2) G A 
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(b) parallel to {Ip,Ip), if it can be written as Ai2 = A(B (2p,2p), where 
A is orthogonal to Ip x Ip in the sense of definition (a). 

(c) parallel to Ip x Ip, if it can be written as A^2 = A(B {Ip x Ip), where 
A is orthogonal to Ip x Ip in the sense of definition (a). 

The definition of orthogonality Definition 3.2(a) is equivalent to the condi- 
tion that for every tangent vector A = (^i, ^2) to A42, {{A, {alp, PIp)))a^^r = 
for all a, /3 G ]R. This can be checked easily replacing in (21b): setting a = P 
gives tr(avg(A)) = and using a = —n2/ni(3 gives tr(A(A)) = 0. All three 
properties defined in Definition 3.2 are orthogonally invariant: if M2 belongs 
to one of the three categories, the set {{QiMiQ'^,Q2M2Q'2)-,{Mi,M2) G 
A^2}, where Qi,Q2 G Op, also belongs to the same category as J^2- It is 
easy to check that each of the sets in Figure 3 belongs to one of the cate- 
gories in Definition 3.2(b) or 3.2(c): the affine subset {Mi = M2} and the 
set M2^D are parallel to {Ip,Ip); the other two are parallel to Ip x Ip. 

Theorem 3.2. S is orthogonally invariant if and only if, for every em- 
bedded suhmanifold M.2 C SpXSp that is either parallel to {Ip,Ip) or parallel 
to Ip X Ip, every critical point M = (Mi,M2) of gT.{M) = ni\\Yi — Mi||| -|- 
^^211^ — -^2|ls over A^2 is also a critical point of gifi{M) = nitr[(Yi — 
Mi)2]+n2tr[(y2-M2)2]. 

Proof, (i) Suppose S is orthogonally invariant and let B = Y — M € 
Sp X Sp. Every critical point M of gs{M) must satisfy the orthogonality 
condition {{A, B))^2^^ = for every tangent vector A = {Ai,A2) to M.2 at 
M. Using (12) and (21b), the orthogonality condition can be written as 



(23) 



ntr[avg(A)avg(S)l + ^ tr[A(^)A(S)] 

n 

ntr[avg(^)]tr[avg(S)] + ^ tr[A(^)] tr[A(S)] 

n 



0. 



M is a critical point of gifi{M) only if the first bracket in (23) is zero. We 
thus need to show that the second bracket is equal to zero for all r and all 
tangent vectors A. 

Assume A^2 is parallel to {Ip,Ip). The tangent space to A^2 at M can be 
written as ^© (Ip,Ip) where every tangent vector A = (^1, yl2) G A satisfies 
tr(Ai) = tr(^2) = 0. For every A^A, tr(avg(^)) = tr(A(A)) = and we are 
done. For the remaining component parallel to (Ip,Ip), let A = a{Ip,Ip) so 
that avg(^) = alp and A(^) = 0. Replacing in (23), we see that tr[avg(i?)] = 
for any critical point with respect to ((•, •))ct2 ,-. Since A(j4) = 0, the second 
bracket in (23) vanishes for all r. 
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Assume instead A^2 is parallel to 2p x Xp. The tangent space to A^2 
at M can be written as ^ © (2p,2p) © {n2lp,—nilp) where A is orthog- 
onal to Tp X Tp and (n22p, —riiTp) = {(Mi,M2) = 7(n2/p, —nilp),-) G M} C 
Xp X 2p. The first two components were covered in the previous case. For 
every tangent vector A in the component parallel to {n2lp, —ni2p) , let 
A = a{n2lp, —nilp), so that avg{A) = and A{A) = anip. Replacing in (23), 
we see that tr[A(i?)] = for any critical point with respect to ((•, Since 
avg(A) = 0, the second bracket in (23) vanishes for all r. 

(ii) The proof is similar to the proof in the one-sample case (Theorem 
3.1). □ 

By Theorem 3.2, if we assume the orthogonally invariant model (11), then 
within the class of sets M2 that are either parallel to (Ip,2p) or parallel to 
Ip X Ip, the problem of finding the MLE of A-I reduces to minimizing gi^Q{M) 
over A^2- Once M is found, the estimates of a and r in model (11) easily 
follow as shown in the following lemma. 

Lemma 3.6. Let Yi, . . . ,Yn-^ and l^^+i, . . . ,Yn, n = ni + n2, be two inde- 
pendent i.i.d. samples from Npp{Mi,a'^,T) and Npp{M2,(T'^,T), respectively, 
p>2. Suppose the pair M = (Mi, M2) lies in a nonempty closed subset M.2 
ofSp X Sp and denote by M = (Mi,M2) the MLE of M . Then, given M and 
T, the MLE of is given by 

(24) ^ ^ ^(^ill^i - ^lll'^r + "211^2 - M2||'2,J, 

where 

1 / ni n \ 

'i'^ \i=\ i=ni+l / 

is the pooled variance. Moreover, given M , the MLE of r is 

Er=i \\y. - ^^z{y)\\l,/p + - m wj./p + n2\\Y2 - mj^/p 

(q- l){Er=i[tr(^i - avg(y))]2 + ni[tr(yi - Mi)2] + ni[tr(y2 - M2W} ' 
The above estimates are unique and are consistent as ni , ?i2 —> 00 . 

In the testing situation, let Yi, . . . , Y^ and ym+ij ■ ■ ■ ,Yn, n = ui + n2, 
be two independent i.i.d. samples from Npp{Mi,T,) and iVpp(M2,S), respec- 
tively. Consider the test i^o : (Mi, M2) G A^o vs. i^o : (Mi, M2) G Ma, where 
A^o C Ma ^ Sp X Sp are closed embedded submanifolds with dimensions 
kA = dim(A^^), ko = dim(A^o), ^a > ko. Let Mi,o, ^2,0 denote the MLEs 
of Ml and M2 under Hq, and Mi^^, M2,yi the corresponding MLEs under 
Ha- 
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Lemma 3.7. Suppose Mq C Ma cLf^ close embedded submanifolds of 
Sp X Sp. Given S, the LLR test statistic 2{Ia — h) is 

T = m 1 1 Yi - Ml ,0 1 ll + na 1 1 F2 - M2,o I ll 

(25) 

- nillYi - Mi,a||| - n2||?2 - ^2,^111 

and is distributed as x^i^^A — ^0) under Mq asymptotically as ni,n2 — > 00. 
If Mo C Ma o,re both affine subspaces of Sp x Sp, then the distribution is 
exactly x^i^A — ko) for all finite ui and n2- 

If T, is orthogonally invariant, then given a and r, the LLR test statistic 
2{Ia — Iq) is the same as (25) with the norm defined by (12). As in the one- 
sample case, if o"^ and r are unknown, then they can be replaced by the 
consistent estimators of Lemma 3.6. 

Example 3.2 (Unrestricted). If M2 = Sp x Sp unrestricted, the MLEs 
are Mi = Yi, M2 = Y2, cr"^ = S12 and f is given by (19). Yi and Y2 are 
independent Npp{Mi,a'^ /n,T) and Npp{M2,cr'^ /n,T), respectively, and are 
independent of S12 given r. In addition, given r, gnsfg/c^ ~ X^ilin- ~ 2)). 

The LLR test statistic for testing Hq : Mi = M2 vs. Ha '■ Mi / M2 is given 
by Lemma 3.7. Replacing M = Mi,o = M2,o = Y, Mi^a = Yi, and M2,A = Y2 
in (25), the test statistic given a and r is T= {nin2/n)\\Yi — 12||^2 ^ ~ X^{q) 
under Hq. If a and r are unknown, then the LLR is an increasing function 
of the statistic T = (n — 2)nin2\\Yi — Y2II1 r / i^''^'^ ^12) which is approximately 
F{q, q{n — 2)) under Hq for large ui and n2. Both test statistics above reduce 
in the univariate case (p = 1, r = 0) to the squares of the known two-sample 
z and t-statistics. 

4. Inference for eigenvalues and eigenvectors in the one-sample problem. 

For any of the sets described in Section 2.1 and Figure 1, M immediately 
specifies the MLE of o"^ and r by Lemma 3.3. In addition, by Lemma 3.4, 
all we need to know is the form of the test statistic (18a) or (18b) and 
corresponding test statistics for unknown a and r may be derived from 
there. Thus, for brevity, we assume from here on that a and r are known. 
To fix ideas, we begin with the linear cases and then move on to the nonlinear 
ones. 

4.1. Affine subspaces. The unrestricted case (AO) was covered in Exam- 
ple 3.1. Both cases (Al) and (A2) involve the same affine subspace Muo 
defined by (2). In order to derive the LLR statistics for both these cases we 
need the MLE oi M in Muo- 



EIGENVALUES AND EIGENVECTORS OF GAUSSIAN SYMMETRIC MATRICES23 



Proposition 4.1. Let Muo be given by (2). The MLE of M over Muq 
is M = UqDU^ where 

(26) D = diag(C/^yC/o). 

The diagonal elements of D are multivariate normal N {diag{D) , (Jp + clplpcr^/ 
n), where c = r/(l — pr). 

Proof. Because Mjjo is an affine subspace, the MLE of M is found by 
orthogonal projection. Mu^ is parallel to Tp. By Theorem 3.1, M = UqDUq, 
where t) is the minimizer of c/i,o(-D) = tr[(y - UoDU(^f] = tr[(C/|^yC/o - Df] ■ 
The solution is the diagonal matrix D closest in Frobenius norm to UqYUq, 
that is, (26). It is easy to see that D is unbiased. Also, because of the orthog- 
onal invariance property, UqYUq ~ Npp{UQMUo,a'^ /n,T) = Npp{D,a'^ /n,T), 
so diag(^) ~ iV(diag(L'), (Ip + clpl'p)a^ /n). □ 

The estimate D is invariant under sign changes of the eigenvectors and 
depends on the order of the columns of Uq only in the sense that permuting 
the columns of Uq will result in the same permutation of the columns of 
D. If P is a signed permutation matrix with elements +1 or —1 and the 
eigenvector matrix Uq is permuted to UqP, then diag{(UQPyY (UqP)) = 
diag{P'{Ul^YUo)P) = P' diag{U^YUo)P = P'DP. Notice that D is the same 
whether or not the true underlying eigenvalues D have multiplicities. 

In case (Al), the LLR test statistic for testing Hq : M = Mq vs. Ha '■ M G 
Muo is obtained from Lemma 3.4. Here Mq = UqDqUq and = UqDUq, 
where D = diag{UQYUo), so given a and r, the LLR test statistic is 

(27) T = n\\D-Do\\l2^,^x\p)- 

In case (A2), the test is Hq-.M £ Muo vs. M ^ Muo - Here Mq = UqDU^, 
D = diagiU^YUo), and Ma = Y , so given a and r, the LLR test statistic is 

(28) T = n\\Y-UobUi\\l^r;^x\q-p)- 

' Hq 

4.2. A convex polyhedral cone [Case (C2)J. Let -Mu^ be defined by (3). 
This set is parallel to Ip according to Definition 3.1, and Theorem 3.1 applies 
by checking each face for critical points. Define the vectors y = diag^U^YUQ) 
and d = diag(Z)). A similar argument as in the proof of Proposition 4.1 gives 
that the MLE of M is M = UqDUq, where the vector of diagonal elements 
d of D is the minimizer of \y — d\'^ constrained to di >■■■> dp. Many al- 
gorithms have been proposed for solving this type of problem, sometimes 
called order-restricted least squares, isotonic least squares or nonnegative 
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least squares (since the consecutive differences are nonnegative) (Lawson 
and Hanson [19], Dykstra [9]). A direct way to solve it in our case is to ap- 
ply the pool-adjacent-violators algorithm (PAVA) with equal initial weights 
(Robertson and Wegman [28], Raubertas [27]). 

If the true parameter D is such that di > ■ ■ ■ > dp, then y/n[d — d) is 
asymptotically multivariate normal A^(0,cr^(/p + clj,lp)). Otherwise, if d has 

multiplicities, the asymptotic distribution of ^/n{d — d) is the projection of 
A^(0, cj^(/p + clplp)) onto the support cone at the trued (Self and Liang [31]). 
For example, if p = 3 and the true parameter d is such that di = ^2 > c^3) 
then the support cone at d is the cone di > d2 (because the inequality d2 > ds 
asymptotically has no effect) . The asymptotic distribution of ^/n{d — d) is 
the projection of A^(0,(t^(/3 -fcl3l3)) onto the support cone at d: it has 
mass 1 /2 on the plane di = d2 and is normal on the half-space di > d2- 

Proposition 4.2 [Case (C2)]. Consider the test Hq : M G M(}^ vs. M ^ 
Mf}^, and let UqDUq he the MLE of M in Mfj^. Given a and t, the LLR 
statistic is T = n\\Y — UoDUq\\'^2 ^- If the true D has k <p distinct eigen- 
values, then the asymptotic distribution of T is a mixture of p — k + 1 x^- 
variables with degrees of freedom q— p,q— p+l,...,q — k. The weights of 
the mixture depend only on the angles of the cone at M = UqDUq and do 
not depend on a or t. 

Proof. The expression for T is obtained in a similar way to (28). Its 
asymptotic distribution follows from the theory of LLRs on convex cones 
(Self and Liang [31]). The weights of the mixture do not depend on a because 
o" is a scaling factor that cancels out in T. They also do not depend on r 
because Mf}^ is parallel to Ip and the relevant angles between the faces of 
the cone are computed as inner products on the space orthogonal to Ip. 
These inner products do not depend on r by Lemma 3.1. □ 

As specific cases, if D has p distinct eigenvalues, then T is asymptotically 
x'^{q~p)- The largest number of mixture components possible is p, obtained 
when D is isotropic. In this case we get a mixture of variables with degrees 
of freedom q—p, . . . , g — 1. As an example, take p = 3. If the true underlying d 
is oblate (di = ^2 > ^3), then the cone at d locally looks like a half space and 
the asymptotic distribution of T is (l/2)x3 + (l/2)x4. On the other hand, 
if D is isotropic (di = d2 = d^), the cone at d is framed by the intersection 
of two planes at an angle of 60°, so the asymptotic distribution of T is 

(l/6)xi + (l/2)xl + (l/3)xi. 

For general p, the mixture weights can be easily obtained by simulation. 
Given d, generate y ~ N{d, Ip) and project it onto the appropriate cone using 
the PAVA algorithm while keeping track of the dimension of the face that 
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y is projected onto. Repeating this many times, the weight corresponding 
to the \^{q — k) mixture component is the proportion of times that y gets 
projected onto a face of dimension k. 

4.3. Curved submanifolds [Cases (SI) and (S2)]. Both tests (SI) and 
(S2) involve the same submanifold Ai Dq ■ The dimension of the submanifold 
M-Do depends on the multiphcities of Dq and is given by the following 
lemma. The MLE of M in is then given by Theorem 4.1 below. 

Lemma 4.1. dim{MDo) = q- J2i=imi{'mi + l)/2. 

Proof. Mdq is diffeomorphic to the quotient Op/Omi x ••• x Orrik. 
This is because if Dq has diagonal blocks (ii/mi ) • ■ • j dklm^ ^'^d Q is block di- 
agonal orthogonal with orthogonal diagonal blocks Qi, . . . , Qk, then Q'DqQ = 
Dq. The set of such matrices Q is Onii x • • • x Omk- Therefore, 

dim(A^) = dim(Op) - dim(OK)) = ^^^^ _ i ^ ^.(^^ _ i). 

1=1 1=1 

Theorem 4.1. Let Mdq be given by (4) and let Y = VAV' be an eigen- 
decomposition chosen so that the diagonal elements of A are in decreasing 
order. 

(i) The MLE of M is M = UDqU' , where U is any matrix of the form 
(29) U = VQ, 

and Q is an orthogonal matrix such that QDqQ' = Dq, that is, block diagonal 
orthogonal with diagonal orthogonal blocks of sizes mi, . . . ,mfc. In particular, 
if the eigenvalues of Dq are distinct, then such Q 's are diagonal matrices 
with diagonal entries equal to ±1. 

(ii) Assume Dq has distinct diagonal entries di> ■ ■ ■ > dp, in which case 
M has unique eigenvectors U up to sign. Let U in (29) be chosen to mini- 
mize \\UU' — Ip\\ and let A = log{U'U) G Ap, the set of p x p antisymmetric 
matrices. Then, as oo, the entries {aij}i^j of A are asymptotically in- 
dependent and 

^'-^•^^(°' 2(dr-rf,)0 - 

Computing U above leads to a maximization problem in Op. We solve it 
by requiring that the gradient of the objective function be orthogonal to the 
tangent space to Op. For this, we use the fact that the tangent vectors U at 
any point U £ Op are of the form U = UA where A £ Ap, the set of p x p 
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antisymmetric matrices (Chang [5], Edelman, Arias and Smith [10], Lang 
[18], Moakher [25]). Part (ii) of the theorem gives the asymptotic distribution 
of the error incurred in the estimation of U . However, instead of using the 
Euchdean difference U — U io measure the error, we use the Riemannian 
logarithmic map of tJ to the tangent space to Op at U , which results in the 
tangent vector A = \og{U'tJ) ^ Ap (Edelman, Arias and Smith [10], Lang 
[18], Moakher [25]). Notice that the variance of aij increases without bound 
as di and dj get closer. This happens because the curvature of on the 
plane di + dj = constant increases without bound as di and dj get closer. In 
the limit when di = dj , we get one eigenvalue with multiplicity two and the 
dimension of the set abruptly goes down by 1. In Figure 2, this phenomenon 
can be seen as the collapse of the circle into its center. The result is that 
estimates of eigenvectors become increasingly variable as the corresponding 
eigenvalues get close, and unidentifiable if they are equal. 

Proof of Theorem 4.1. (i) Mdo is parallel to Ip. Thus by Theorem 
3.1, [/is the minimizer of 

(30) g{U) = tr[(y - UDoU'f] = iT{Y^) - 2ti{U'YUDo) + tr(L»g) 

subject to UU' = Ip. The critical points of g{U) with respect to the con- 
straint U'U = Ip are those points U where the gradient is orthogonal to the 
surface U'U = Ip. The tangent vectors to that surface at U are of the form 
U = UA., where A e Ap. To see this, trace a curve Q{t) G 0{p) such that 
(5(0) = U . Differentiating U'U = Ip with respect to t and evaluating at t = 
gives U'U + U'U = 0, so ^ = U'U G Ap. 

Using matrix derivative rules (Fang and Zhang [12], page 16) we get that 
the gradient of g{U) is dg{U)/dU = — AYUDq. The gradient must satisfy 

(^^ ^) = -^tr[{UA)'YUDo] = -AtT{A'U'YUDo) = 
\ oU / u=u 

for all A G Ap. It is easy to check that the space Ap is orthogonal to 
Sp, that is, tr{AB) = for all A G Ap and B G Sp. Thus we must have 
that U'YUDq is symmetric. This is satisfied hy U = VQ, where Q is any 
orthogonal matrix such that QDq = DqQ. Plugging back into (30) gives 
g{U) = tr[(y - VQDoQ'V'f] = tr[(A - Dq)^], which is minimized if the 
eigenvalues in A are chosen in decreasing order. Of course, eigenvalues in 
A corresponding to sets of repeated eigenvalues in Dq can be permuted 
without affecting the minimization. 

(ii) Assume momentarily that the covariance structure is spherical (r = 
0). The definition A = log{U'U) provides a parametrization of SOp (the 
set of orthogonal matrices of determinant 1) given hy U = U exp{A), while 
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at the same time A parametrizes the tangent space to SOp at U. Minus 
twice the maximized log-hkehhood is ng[U). To get the covariance of A, 
we need the score function, which is the gradient of ng{U). To obtain it, 
trace a curve Q{t) = Uexp{At) passing through U in the direction A. Let 
W = U'YU. The derivative of the log-hkehhood in the direction A is the 
derivative of ng{Q{t)) with respect to t evaluated at i = 0: 

2n|tr(e^Ve^*I))|,=o 

2niT{e^''A'We^'D + e^' Ae^' D)\t=Q 
2ntr{A'WD + WAD) = 2ntj:[A'{DW - WD)]. 

This is the inner product of A with the antisymmetric matrix S = 2n{DW — 
WD). Thus S is the score function in the standard basis for Ap and Sp. It 
is easy to see that the entries of S are sij = 2n{di — dj)wij for i ^ j. Since 
W = U'YU ~ Npp{D,a^ /n), the off-diagonal entries of W are independent 
A'^(0, (T^/(2n)). Thus the sij are independent normal with mean and vari- 
ance var(sjj) = An{di — dj)'^ vai{wij) = 2n{di — dj)'^a^. This is the Fisher 
information with respect to aij. Thus, asymptotically, the entries \fndij are 
independent and each normal with mean zero and variance u^/ (2{di — dj)'^). 

For the above calculations we assumed r = 0. However, the tangent space 
to is orthogonal to Tp according to Definition 3.1. By Lemma 3.1, all 

the inner products computed above are the same if r 7^ 0. Therefore the 
asymptotic distribution obtained above is also the same if r 7^ 0. □ 

We can now compute the LLR statistics for tests (SI) and (S2). 

Corollary 4.1 [Case (SI)]. Consider the test Hq:M = Mq vs. Me 
A4 Do ■ Let Y = VAV' be an eigendecomposition of Y so that A and Dq are 
in decreasing order. Given a and t, the LLR test statistic is 

(31) T = ^MADo) - tr(yMo)] "^^x' f 9 - ^ E + 1) 

independent of r . 

Proof. Here Mq = Mq and Ma = VDqV'. Replacing these in (18b) 
gives T = 2n{Y ,VDoV' — Mq)^2^^. But the second argument of the inner 
product has zero trace so the r-term drops. Thus T = 2niv\Y iy DqV' — 
Mo)]/(T^, which equals (31). The number of degrees of freedom is equal to 
the dimension of minus zero for the single point Mq. □ 



dg{Q{t)) 



dt 



t=Q 
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The form of the test statistic (31) is interesting. Recall that (SI) is a test 
of whether Mq has eigenvectors Uq when the eigenvalues Dq are assumed 
known. If the eigenvectors VofY are equal (up to sign) to the eigenvectors 
Uq of Mq, then T is equal to zero. As the angles between the eigenvec- 
tors in V and Uq increase, the inner product between Y and Mq decreases, 
which increases the value of the test statistic for fixed eigenvalues A and 
1^0 • Another way to see the dependence on the eigenvectors is to rewrite T 
as T = 2n[tr(AL»o) - tv{AVDQV')]/a^ , where V = V'Uq. The second term 
in the brackets measures the angles between the eigenvectors V and Uq 
weighted by the eigenvalues. Here we can see how the multiplicities in Dq 
play a crucial role. As the eigenvalues of Dq get closer, the angles between 
the corresponding eigenvectors in V and Uq become harder to detect, and 
are unidentifiable if there are exact multiplicities. 

It is also not surprising that T in (31) does not depend on r. All the 
variability in T is due to the variability of the MLE on Ai Da , which is 
orthogonal to Ip. 

Corollary 4.2 [Case (S2)]. Consider the test Hq:M eM Dq vs. Ha ■ M ^ 
M Do ■ Let Y = VAV' he an eigendecomposition of Y so that A and Dq are 
in decreasing order. Given a and t, the LLR test statistic is 

(32) T = nil A - DqWI,^^ "^x' (^^ E ^^(^^ + 1)) • 

Proof. In the notation of Lemma 3.4, Mq = VDqV' by Theorem 4.1, 
and ]Ma = Y. Replacing these in (18a) gives (32). Under Hq, M.Do has 
dimension dim{J^Do) = 9 — Z)i=i "T'j(™'i + Under the dimension is 
q, so the number of degrees of freedom for the test is Y^^=ii^iijni + l)/2- 
□ 

Since this is a test of whether Mq has eigenvalues Dq while treating the 
eigenvectors as nuisance parameters, it is not surprising that the test statistic 
(32) is equal to the normalized distance between Dq and the eigenvalues of 
Y. 

4.4. Curved submanifolds [Case (S3)]. We have the embedded subman- 
ifold Almi,...,m.fe given by (5), the set of matrices with unspecified k<p 
distinct eigenvalues di, . . . ,dk and corresponding multiplicities mi, . . . , ruk- 
The eigenvalues are assumed to have a known order. For example, a prolate 
matrix [di > d2 = d^) is different from an oblate matrix {di = d2> d^), even 
though both have the same number of multiplicities. The dimension of the 
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set M mi,...,mj; is the dimension of the set defined in Lemma 4.1 plus k for 
the free eigenvalues, in total 

k 

(33) dim{Mmi,...,mk) =k + q- ^^mi{mi + 1). 

i=l 

The MLE of M is given by Theorem 4.2 below. For this we need the following 
definition. 

Definition 4.1. Define the cumulative multiplicities cq = 0, ej = J2i=i 
j = 1, . . . , A;, so that = J2i=i = P- We define the block average of a diag- 
onal matrix A according to the ordered sequence of multiplicities mi, . . . , m^, 
denoted blkmi,...,mfe(A) G Pp, as the block diagonal matrix formed by par- 
titioning A into diagonal blocks of sizes mi, . . . ,mk in that order and re- 
placing the diagonal entries in each block by the average of the diagonal 
entries in that block. Specifically, for j = 1, . . . ,k, the jth diagonal block 

of blkm^^...^mj. (A) is {i/mj)J2'iLe _i+i^ilmj- Below we use the shorthand 
notation blk(A) when the multiplicities are understood from the context. 

Theorem 4.2. Let Mmi,...,mk be given by (5) and letY = VAV' be an 
eigendecomposition with eigenvalues in decreasing order. The MLE of M is 
M = UDU' where U = VQ and Q is block diagonal orthogonal with orthog- 
onal diagonal blocks Qi, . . . , Qk of sizes mi, . . . , m^ and D = blkmi,...,™^ (A) 
given by Definition 4-1- 

Proof. The set Mmi,...,m,, is parallel to Tp. By Theorem 3.1, the MLE 
minimizes g{U,D) = tr[(y — UDU')"^]. Given D, the minimization with re- 
spect to U is the same as in Theorem 4.1. Notice that the result does not 
depend on the actual entries of D but only on their multiplicities. To find the 
MLE of -D, choose Q = Lp so that U = V, where the eigenvectors correspond 
to eigenvalues in decreasing order. Then 

g{U,D)=tr[{Y-VDV'f]=tv[{A-Df]=j2 ^ {Xi-djf. 

j=li=ej-i+l 

Taking the derivative with respect to any particular eigenvalue dj and equat- 
ing to zero gives dj = (1/mj) X^iie^^i+i which gets repeated over the 
block of size mj. □ 

The solution to the test of whether M is isotropic or partially isotropic 
with particular multiplicities of its eigenvalues is given by the next corol- 
lary. The test statistic is the distance between the eigenvalues of Y and the 
estimated eigenvalues under the assumption of the particular multiplicities. 
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Corollary 4.3 [Case (S3)]. Consider the test Hq: M e Mmi,...,m^, vs. 
Ha ■ M ^ Mmi,...,mk ■ Given a and t, the LLR test statistic is 

(34) T = ^ tr [(A - blk(A))2] "-^ E mim, + l)-kj. 

Proof. Replacing the MLE from Theorem 4.2 for Mq in (18a) and 
Ma = Y gives T = n\\A — blk(A)||^2 ^- But the argument of the norm has 
trace zero, yielding (34). The number of degrees of freedom is equal to q 
minus the dimension of Mmi,...,mki which is given by (33). □ 

5. Inference for eigenvalues and eigenvectors in the two-sample problem. 

As in the one-sample case, we assume a and r are known. If not, adjustments 
for unknown a and r can be made based on Lemma 3.6. The unrestricted 
case (AO) was covered in Example 3.2. We proceed with cases (SI) and (S2). 

In order to solve the LLR statistics for these cases, we first need to find 
the MLEs of Mi and M2 in A42,D- The solution is given in Theorem 5.1 
below. In what follows, let Yi = FiAiF/, Y2 = V2A2V^ and Y = VAV be 
eigendecompositions, all with eigenvalues in decreasing order. 

Theorem 5.1. Let Ai2,D be given by (7) and suppose the eigenvalues 
di > ■ ■ ■ > dk of D have known multiplicities mi, . . . , ?Tifc. The MLEs of M\ 
and M2 in M2,d are Mi = UiDU[ and M2 = U2DU2 where: 

(i) Ui and U2 are any matrices of the form Ui = ViQi and IJ2 = y2Q2, 
where Qi and Q2 are orthogonal matrices such that QiDQ'i = D and Q2DQ2 = 
D, that is, block diagonal orthogonal with orthogonal blocks of sizes ?ni, . . . , m^. 

(ii) Let A = {niAi + n2A2) /n. Then D = hlkmi,...,mkiA) given by Defini- 
tion 4-T 

Even though D is unspecified, it turns out that the MLE, while it does 
not depend on the actual value of D, does depend on the multiplicities of 
the eigenvalues of D, because that affects the dimension of the set J^2,D- 
Since this set is parallel to {Ip,Ip), the MLE does not depend on a or r. 

Proof of Theorem 5.1. (i) M2,d is parallel to {Ip,Ip). By Lemma 
3.7 and Theorem 3.2, the MLE minimizes 

(35) g{Ui,U2,D) = nitr[{Yi - UiDU[)''] + n2tr[iY2 - U2DU^f]. 

Given D, each summand is minimized separately as in Theorem 4.1, yielding 
Ui and U2- Notice that the result does not depend on the actual entries of 
D but only on their multiplicities. 
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(ii) The MLE of D minimizes (35). For the MLEs of Ui and U2, choose 
Qi = Q2 = Ip so that Ui = Vi and U2 = V2, where these eigenvectors corre- 
spond to eigenvalues in decreasing order. Replacing these in (35) gives 

g{Ui ,U2,D) = ni tr [(Ai - Df] + tr [(A2 - Df] 

k k ej 

j=li=ej_i+l j=li=ej_i+l 

Taking the derivative with respect to a particular eigenvalue dj and equating 
to zero gives 

1 4^ niXiA+n2X2A 



dj = — E 



^0 i=e"+l ™ □ 

We can now solve the LLR statistics. 

Corollary 5.1 [Case (SI)]. Consider the test HQiMe M2,d vs. Ha ■ M ^ 
A^2,D; where M2,D is given by (7). Given a and t, the LLR test statistic is 

(36) T = ^ II Ai - A2 11^,^^ + ^ tr[(A - blk(A))2] 

and is asymptotically x^(Si=i "m-iin^i + 1) ~ ^) under Hq as 00. 

Proof. Under Hq, the MLEs are given by Theorem 5.1. Under Ha, 
both the eigenvalues and eigenvectors are unrestricted, and so the MLEs are 
simply Mi^A = Yi and M2^a = ^2- By Lemma 3.7, the LLR test statistic is 

r = mil 14 All// - ^ibik(A)yi'||22_^ + n2||y2A2F2 - "^"2bik(A)y2'||^2_, 
= mllAi - blk(A)||22^^ + n2||A2 - blk(A)||22^^ 

= ^l|Ai - A2||22_^ + n||A - blk(A)||22_^, 

where we have applied the equivalent form (21b) of the norm. The expression 
simplifies to (36) by noting that the second summand has trace zero. Under 
Hq, by Lemma 4.1, the parameter set has dimension dim(A^2,D) = k + 2(q — 
J2i=i t^iif^i + l)/2) = k + 2q — X]i=i 'mi{mi + 1). Under Ha, the parameter 
set has dimension dim(A^^) = 2q. By Lemma 3.7, T is asymptotically 
with number of degrees of freedom dim(A^^) — dim(A^o) = J2i=i "^i(w.j + 
1) — A; degrees of freedom. □ 



The first term in the test statistic (36) resembles the test statistic for the 
unrestricted case (Example 3.2), but it involves the eigenvalues only. The 
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second term captures the effect of the multiphcities. If D has no repeats (all 
the eigenvalues have multiplicity one), then the block average of A is the 
same as A and the second term drops. The asymptotic distribution of the 
test statistic in this case is x^(p)- On ttie other hand, if D is fully isotropic 
(one eigenvalue of multiplicity p), we get a ')(^{2q — 1). 

Corollary 5.2 [Case (S2)]. Consider the test Hq:Mi = M2 vs. Ha ■ (Mi, 
M2) S A^2,D where M.2,D is given by (7). Under both Hq and H^, D is as- 
sumed to have unknown eigenvalues di> ■ ■ ■ > d^ with known multiplicities 
mi, . . . ,mk- Given a and r, the LLR test statistic is 

r = ^[tr(AiA2)-tr(yiF2)] 

(37) 

+ 4 tr[(A - blk(A))2 - (A - blk(A))2] 

and is asymptotically x^{q — Yl^=i 'f^ii'^^i + l)/2) under Hq as n — > 00. 

Proof. Under Hq, the data can be regarded as one sample of size n. 
By Theorem 4.2, the MLEs are Mi,o = M2,o = Vh\k{A)V'. Under Ha, The- 
orem 5.1 gives Mi^A = Vi blk(A)VY and M2,A = F2blk(A)y2'- By Lemma 3.7, 
replacing these MLEs in (25) gives 



T = n\\Y - y blk(A)y'||^.,, + ^11 A(y)||^.,, 
- ni II Ai - blk(A) 11^2^^ - n2|| A2 - blk(A) 

= n||A-bik(A)e.,, + !^||yi-y2||^v 

-n||A-blk(A)||^.,,-!^||Ai-A2e.,, 
n 

+ n||A-blk(A)||2,^^-n||A-blk(A)||2,^^, 

where we have used the form (21b) twice. Expression (37) follows by noting 
that tr(Ai)tr(A2) = tr(yi) tr(y2) and that the terms inside the norms have 
trace zero. 

Under Hq, by Lemma 4.1, the parameter set has dimension dim(A^o) = 
k + q — X)f=i ^1(^-1 + l)/2- Under Ha, the parameter set has dimension 
dim(7W) =k + 2{q- Eii mi{m^ + l)/2) = k + 2q - ELi mi{mi + 1). By 
Lemma 3.7, T is asymptotically with number of degrees of freedom 
dim(A^yi) — dim(A^o) = q — J2i=i ^■ii'nii + l)/2 degrees of freedom. □ 



EIGENVALUES AND EIGENVECTORS OF GAUSSIAN SYMMETRIC MATRICES33 



The form of the test statistic (37) is interesting. Recall that (S2) is a 
test of whether Mi and M2 have the same eigenvectors when they are 
assumed to have the same eigenvalues. Consider the first bracket Ti = 
tr(AiA2) — tr(yiy2)- The behavior of this term is similar to (31). If the 
eigenvectors in Vi and V2 are the same, then Ti is zero. As the angles be- 
tween Vi and V2 increase, then the inner product between Yi and Y2 de- 
creases, which increases the value of Ti for fixed eigenvalues Ai and A2. 
Another way to see the dependence on the eigenvectors is to rewrite Ti as 
Ti = tr(AiA2) — tr(AiyA2V^'); where V = V/V2. The second term measures 
the angles between the eigenvectors Vi and V2 weighted by the eigenvalues. 
Here we see how the multiplicities of the common unknown eigenvalues D 
play a crucial role. As the eigenvalues of D get closer, the angles between the 
corresponding eigenvectors in Vi and V2 become harder to detect, and are 
unidentifiable if there are exact multiplicities. The multiplicities also play a 
role in the last two terms in (37). If the eigenvectors Vi and V2 are equal (up 
to sign), then the difference between those two terms is zero. As the angles 
between Vi and V2 increase, the eigenvalues A of the pooled average decrease 
with respect to the average of the eigenvalues A, which also increases the 
value of T. 

As mentioned earlier in Section 2.2, cases (S3) and (S4) are difficult be- 
cause they involve the set M2,u given by (8), whose MLE has no closed- form 
solution. The MLE for the common matrix U £ Op in this case could po- 
tentially be found using numerical optimization techniques such as the ones 
described by Edelman, Arias and Smith [10]. A useful simplification here is 
that the set A42,u is parallel to Ip x Ip in the sense of Definition 3.2, so by 
Theorem 3.2, the MLE given a and r can be solved assuming = 1 and 
r = 0. 

6. Summary and discussion. In this article we have derived MLEs and 
LLR tests for the mean parameter M of the orthogonally invariant symmetric- 
matrix-variate normal distribution. This has been carried out for many sub- 
families of interest regarding eigenvalues and eigenvectors of M, both in the 
one-sample and two-sample settings. The parameter sets involved have been 
affine subspaces and orthogonally invariant embedded submanifolds of Sp 
and SpX Sp, in addition to one case in which the parameter set is a polyhe- 
dral convex cone in Sp. In these parameter sets, the geometry of the sets and 
the multiplicity pattern of the eigenvalues have played crucial roles. Despite 
some of the classical characteristics of this problem, conceivably it was not 
attempted before because there was no data that required it. The emergence 
of new data such as DTI and polarized CBR has made this need come to 
life. 

As pointed out by Theorems 3.1 and 3.2, the main advantage of adopting 
an orthogonally invariant covariance structure has been analytical, allow- 
ing the derivation of explicit formulas for the MLEs and LLRs. We have 
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identified a class of sets in Sp and anotlier in Sp x Sp where the MLEs are 
easy to obtain because they do not depend on the covariance parameters. 
Furthermore, Theorems 3.1 and 3.2 show that the orthogonahy invariant co- 
variance is the most general covariance structure with that property within 
those classes of sets. The explicit formulas presented in Propositions 3.1, 4.1 
and 4.2, in Theorems 4.1, 4.2 and 5.1 and in Corollaries 4.1, 4.2, 4.3, 5.1 
and 5.2 have helped gain analytical insights. But they are also useful from 
a computational point of view. Both in DTI and CBR data, images contain 
hundreds of thousands of pixels, with one random symmetric matrix at each 
pixel. For computations to be efficient, it is extremely useful to have explicit 
formulas that can be easily evaluated at each pixel. 

For real data, the normality assumption can be checked for any particular 
data set using any test of multivariate normality applied to the data vectors 
vecd(li), i = 1, . . . ,n, defined in Section 3. While the orthogonally invariant 
covariance structure has been shown to be appropriate for DTI (Basser and 
Pajevic [3]), this assumption can also be checked using the LLR test for 
orthogonal invariance given in Proposition 3.1. More ambitious data analysts 
may not want to make such assumptions on the covariance. In those cases, 
least squares solutions may be found using numerical optimization methods. 
Another option is to use the statistics derived in this paper but to adjust 
their distributions when the true covariance is not orthogonally invariant. 
This would be akin to using the standard least squares solution in a linear 
regression problem with correlated errors and then adjusting the standard 
errors of the estimates. Moreover, the MLEs in this paper can still serve as 
least squares estimators when the distribution of the data is not Gaussian. 
These lines of research are left for future work. 
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